Data Mining - An Industrial Research Perspective

Author

  • C. Apte
Abstract

Data mining has burst into the limelight recently, thanks to a series of key application successes [2]. Big business and industry have raised the stakes by investing in this nascent technology and have set high expectations for this emerging area. Although the methods and systems for data mining are based upon and influenced by years of classical work in statistics, pattern recognition, information theory, and machine learning, a combination of factors has caused the recent resurgence of interest. These factors include the availability of high volumes of on-line enterprise data, inexpensive access to high-performance computational resources, and continuing advances in the underlying data analysis algorithms. This article describes activities centered on this emerging area of technology, with a focus on research in progress and the applications being pursued.

1 Current State of the Art

Just what exactly is data mining? At a broad level, it is the process by which one extracts accurate and previously unknown information from large volumes of data. This information should be in a form that can be understood, acted upon, and used to improve the decision processes of the entity that owns the data. By this definition, data mining draws on a broad set of technologies, including data warehouses, database management, data analysis algorithms, and visualization. The crux of its appeal lies in the data analysis algorithms, since they provide the automated mechanisms for sifting through large volumes of data to extract useful information. The analysis capability of these algorithms, coupled with today's data warehousing and database management technology, makes it possible to mine useful knowledge from very large business and industrial data sets.
The data analysis algorithms (or data mining algorithms, as they are more popularly known nowadays) can be divided into three major categories according to the nature of the information they extract: predictive modeling (also known as classification or supervised learning), clustering (also known as segmentation or unsupervised learning), and frequent pattern extraction. The data representation model for all these algorithms is quite straightforward. Data is considered to be a collection of records, where each record is a collection of fields. Using this tabular data model, the data mining algorithms operate on the contents under differing assumptions and deliver results in differing formats.

Predictive modeling is based upon techniques used for classification and regression modeling. One field in the tabular data set is pre-identified as the response or class variable, and these algorithms produce a model for that variable as a function of the other fields in the data set, which are pre-identified as the features or explanatory variables. If the response variable is discrete-valued, classification modeling is employed; if it is continuous-valued, regression modeling is employed. The principal problem addressed by this family of algorithms is to produce a predictively accurate function approximation for the response variable, in the presence of noise, using the data set as examples of the relations between instances of the explanatory variables and the response variable. Once produced, the model can be used to predict the value of the response variable given values for the explanatory variables. This modeling work has its roots in classical statistics [4, 8], although many recent advances have come from other areas, including pattern recognition, information theory, and machine learning. The important...
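
The predictive-modeling setup described above (a tabular data set, one pre-identified response field, the remaining fields as explanatory variables, and a choice between classification and regression) can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not the author's method: records are assumed to be plain dictionaries keyed by field name, the rule "float-valued response means regression, anything else means classification" is an assumption, and k-nearest-neighbour prediction is swapped in only because it handles both tasks compactly. The names `fit_knn`, `is_continuous`, and the loan fields are invented for illustration.

```python
from collections import Counter
from math import dist
from statistics import mean

# Tabular data model: a data set is a list of records; each record maps field names to values.
Record = dict[str, object]


def is_continuous(values) -> bool:
    # Assumption for this sketch: float-valued responses are continuous (regression);
    # anything else (strings, booleans, integer codes) is treated as discrete (classification).
    return all(isinstance(v, float) for v in values)


def fit_knn(records: list[Record], response: str, k: int = 3):
    """Return (task, predict) for the pre-identified response field.

    All other fields are the explanatory variables and must be numeric in this sketch.
    """
    features = [f for f in records[0] if f != response]
    points = [[float(r[f]) for f in features] for r in records]
    targets = [r[response] for r in records]
    task = "regression" if is_continuous(targets) else "classification"

    def predict(query: Record):
        x = [float(query[f]) for f in features]
        # Indices of the k training records closest (Euclidean) over the explanatory fields.
        nearest = sorted(range(len(points)), key=lambda i: dist(points[i], x))[:k]
        neighbours = [targets[i] for i in nearest]
        if task == "regression":
            return mean(neighbours)  # average response of the neighbours
        return Counter(neighbours).most_common(1)[0][0]  # majority class among the neighbours

    return task, predict


# Hypothetical usage: "risk" is the discrete response, so classification is chosen.
loans = [
    {"income": 20.0, "debt": 15.0, "risk": "high"},
    {"income": 25.0, "debt": 18.0, "risk": "high"},
    {"income": 60.0, "debt": 5.0, "risk": "low"},
    {"income": 70.0, "debt": 8.0, "risk": "low"},
    {"income": 65.0, "debt": 4.0, "risk": "low"},
]
task, predict = fit_knn(loans, response="risk")
print(task, predict({"income": 62.0, "debt": 6.0}))  # → classification low
```

Passing a float-valued response field instead (for example, a price) makes the same function switch to regression and return the mean response of the nearest records. A production system would use a properly validated learner rather than this toy neighbour search.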

Similar resources

A Proposed Data Mining Methodology and its Application to Industrial Procedures

Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...

Historical data analysis through data mining from an outsourcing perspective: The Three-phases method

The process of historical data analysis through data mining has proven to be of value for the industrial environment. There are many models available which describe the in-house process of data mining. However, many companies either do not have the in-house skills or do not wish to invest in performing in-house data mining. This research investigates the applicability of two well-established data ...

Identification of the Patient Requirements Using Lean Six Sigma and Data Mining

Lean health care is one of the new management approaches that put the patient at the core of each change. Lean construction is based on visualization for understanding and prioritizing improvements. By using only visualization techniques, much important information could be missed. In order to prioritize and select improvements, it is essential to integrate new analysis tools to achieve a good unders...

Combining data mining and group decision making in retailer segmentation based on LRFMP variables

Data mining is a powerful tool for firms to extract knowledge from their customers’ transaction data. One of the useful applications of data mining is segmentation. Segmentation is an effective tool for managers to make the right marketing strategies for the right customer segments. In this study we have segmented the retailers of a hygiene products manufacturer. Nowadays all manufacturers understand that for st...

Data Mining: An Overview from Database Perspective

Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, suc...

Proposing an approach to calculate headway intervals to improve bus fleet scheduling using a data mining algorithm

The growth of AVL (Automatic Vehicle Location) systems has led to a huge amount of data about different parts of a bus fleet (buses, stations, passengers, etc.), which is very useful for improving bus fleet efficiency. In addition, by processing fleet and passengers’ historical data it is possible to detect passengers’ behavioral patterns in different parts of the day and to use them in order to improve fle...

Publication date: 1997